Fix some bugs in TypeCoercion rule #3407

Merged (8 commits) on Sep 12, 2022

@andygrove andygrove commented Sep 8, 2022

Which issue does this PR close?

Part of #3390

Rationale for this change

This PR fixes a number of bugs that I discovered when running a popular SQL benchmark. It does not fix all of them.

What changes are included in this PR?

  • Fix schema bug that affected queries with more than 2 inputs (union, intersect, etc.)
  • Add type coercion for BETWEEN
  • Add workaround for a bug with + INTERVAL
  • Add coercion support for Date32 and Date64
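
To illustrate the first item: the review excerpt further down shows a fallback arm that produced an empty schema, which would leave nothing to resolve column types against when a plan has three or more inputs. The sketch below folds the schemas of every input into one instead. It is a toy model under stated assumptions, not the code in this PR: `Schema`, `merge_input_schemas`, and `schema` are hypothetical names, and DataFusion's real schema type is `DFSchema`.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a plan schema: column name -> type name.
/// (Hypothetical; DataFusion uses `DFSchema`.)
type Schema = BTreeMap<String, String>;

/// Merge the schemas of all inputs, however many there are.
/// The first definition of a column name wins.
fn merge_input_schemas(inputs: &[Schema]) -> Schema {
    inputs.iter().fold(Schema::new(), |mut merged, input| {
        for (name, ty) in input {
            merged.entry(name.clone()).or_insert_with(|| ty.clone());
        }
        merged
    })
}

/// Helper (hypothetical) to build a toy schema from (name, type) pairs.
fn schema(cols: &[(&str, &str)]) -> Schema {
    cols.iter()
        .map(|(name, ty)| (name.to_string(), ty.to_string()))
        .collect()
}
```

With three single-column inputs, `merge_input_schemas` returns a schema with all three columns, and with no inputs it returns an empty schema, so leaf plans still behave as before.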

Are there any user-facing changes?

No

@github-actions github-actions bot added the optimizer Optimizer rules label Sep 8, 2022
@andygrove andygrove changed the title from "Fix schema bug in TypeCoercion rule" to "Improve TypeCoercion rule" Sep 8, 2022
@andygrove andygrove marked this pull request as draft September 8, 2022 23:39
@github-actions github-actions bot added the core Core DataFusion crate label Sep 9, 2022

codecov-commenter commented Sep 9, 2022

Codecov Report

Merging #3407 (f81fb7c) into master (73447b5) will increase coverage by 0.13%.
The diff coverage is 81.96%.

@@            Coverage Diff             @@
##           master    #3407      +/-   ##
==========================================
+ Coverage   85.49%   85.63%   +0.13%     
==========================================
  Files         296      296              
  Lines       54331    54512     +181     
==========================================
+ Hits        46448    46679     +231     
+ Misses       7883     7833      -50     
Impacted Files Coverage Δ
datafusion/expr/src/binary_rule.rs 84.59% <50.00%> (+0.08%) ⬆️
datafusion/optimizer/src/type_coercion.rs 90.90% <72.97%> (-8.06%) ⬇️
datafusion/optimizer/tests/integration-test.rs 88.88% <100.00%> (+4.88%) ⬆️
datafusion/physical-expr/src/planner.rs 92.68% <0.00%> (-0.87%) ⬇️
datafusion/sql/src/planner.rs 81.06% <0.00%> (-0.62%) ⬇️
datafusion/optimizer/src/reduce_outer_join.rs 98.19% <0.00%> (-0.61%) ⬇️
datafusion/expr/src/logical_plan/plan.rs 77.19% <0.00%> (ø)
datafusion/proto/src/lib.rs 93.85% <0.00%> (+0.33%) ⬆️
datafusion/expr/src/expr_schema.rs 63.47% <0.00%> (+0.59%) ⬆️
... and 8 more

@andygrove andygrove marked this pull request as ready for review September 9, 2022 00:46
@andygrove andygrove added the bug Something isn't working label Sep 9, 2022
@github-actions github-actions bot removed the core Core DataFusion crate label Sep 9, 2022
@github-actions github-actions bot added the logical-expr Logical plan and expressions label Sep 9, 2022
@andygrove andygrove changed the title from "Improve TypeCoercion rule" to "Fix some bugs in TypeCoercion rule" Sep 9, 2022
@alamb alamb left a comment

👍 I had a small question about casting Date32 -> Date64, but otherwise this looks like a nice improvement to me

)?),
_ => DFSchemaRef::new(DFSchema::empty()),
};
// get schema representing all available input fields. This is used for data type
Contributor

👍

DataType::Date32 | DataType::Date64 | DataType::Timestamp(_, _),
&DataType::Interval(_),
) => {
// Arrow `can_cast_types` says we cannot cast an Interval to
Contributor

I don't think it makes sense to cast an interval to a specific point in time -- aka I think DataFusion's coerce_types is incorrect

Member Author

I am tracking this as part of #3419
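
One way to read the point made in this thread: for `date + interval`, the interval is an offset rather than a point in time, so coercion can leave both operand types untouched instead of looking for a common type. The sketch below models that idea with a toy `DataType` enum. It is an assumption-laden illustration, not DataFusion's `coerce_types`: `coerce_plus` is a hypothetical name, and the real Arrow `Timestamp` and `Interval` types carry parameters that are omitted here.

```rust
/// Toy type enum (hypothetical; the Arrow types carry more detail).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Date32,
    Date64,
    Timestamp,
    Interval,
    Int64,
}

/// Return the operand types to use for `lhs + rhs`, or `None` if the
/// toy rules do not support the combination.
fn coerce_plus(lhs: DataType, rhs: DataType) -> Option<(DataType, DataType)> {
    use DataType::*;
    match (lhs, rhs) {
        // Temporal + interval: keep both sides as they are. The interval
        // is never cast to a date or timestamp.
        (Date32 | Date64 | Timestamp, Interval) => Some((lhs, rhs)),
        // Identical types need no coercion.
        (a, b) if a == b => Some((a, b)),
        // Everything else is unsupported in this sketch.
        _ => None,
    }
}
```

Under these toy rules, `coerce_plus(DataType::Date32, DataType::Interval)` keeps `Date32` on the left and `Interval` on the right.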

let expr_type = expr.get_type(&self.schema)?;
let low_type = low.get_type(&self.schema)?;
let coerced_type = comparison_coercion(&expr_type, &low_type)
.ok_or_else(|| DataFusionError::Internal("".to_string()))?;
Contributor

Why the empty error message? That seems like it may be confusing

Member Author

oops .. forgot to go back and finish this. I have pushed a fix.
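
For readers following this thread, here is a minimal sketch of BETWEEN coercion that reports which types failed to coerce rather than returning an empty message. It is not the fix that was pushed: the `DataType` enum, the coercion table, and `coerce_between` are simplified, hypothetical stand-ins, and errors are plain `String`s instead of `DataFusionError`.

```rust
/// Toy type enum (hypothetical; far smaller than Arrow's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Float64,
    Utf8,
    Date32,
    Date64,
}

/// Toy comparison coercion: a common type for two operands, if any.
fn comparison_coercion(lhs: DataType, rhs: DataType) -> Option<DataType> {
    use DataType::*;
    match (lhs, rhs) {
        (a, b) if a == b => Some(a),
        (Int32, Int64) | (Int64, Int32) => Some(Int64),
        (Int32, Float64) | (Int64, Float64) => Some(Float64),
        (Float64, Int32) | (Float64, Int64) => Some(Float64),
        (Date32, Date64) | (Date64, Date32) => Some(Date64),
        // Assumed toy rule: a string compared with a date is treated as a date.
        (Utf8, Date32) | (Date32, Utf8) => Some(Date32),
        (Utf8, Date64) | (Date64, Utf8) => Some(Date64),
        _ => None,
    }
}

/// Find one type that `expr`, `low`, and `high` can all be cast to for
/// `expr BETWEEN low AND high`, naming the offending types on failure.
fn coerce_between(
    expr: DataType,
    low: DataType,
    high: DataType,
) -> Result<DataType, String> {
    let fail = |a: DataType, b: DataType| {
        format!("Failed to coerce types {:?} and {:?} in BETWEEN expression", a, b)
    };
    let low_common = comparison_coercion(expr, low).ok_or_else(|| fail(expr, low))?;
    let high_common = comparison_coercion(expr, high).ok_or_else(|| fail(expr, high))?;
    comparison_coercion(low_common, high_common)
        .ok_or_else(|| fail(low_common, high_common))
}
```

In this sketch, `coerce_between(DataType::Int32, DataType::Utf8, DataType::Int32)` fails with a message that names both `Int32` and `Utf8`, which gives the user something to act on.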

@@ -244,6 +283,34 @@ mod test {
Ok(())
}

#[test]
fn binary_op_date32_add_interval() -> Result<()> {
//CAST(Utf8("1998-03-18") AS Date32) + IntervalDayTime("386547056640")
Contributor

❤️

@liukun4515
Contributor

I will review tomorrow, please hold it

@andygrove andygrove mentioned this pull request Sep 9, 2022
4 tasks

alamb commented Sep 12, 2022

> I will review tomorrow, please hold it

@liukun4515 do you still want us to hold this PR for your review?

@andygrove
Member Author

I'd prefer that we get these bug fixes merged to unblock some other items. Arrow is still following the "Commit-Then-Review" (C-T-R) policy:

A policy governing code changes which permits developers to make changes at will, with the possibility of being retroactively [vetoed](http://www.apache.org/foundation/glossary.html#Veto)

In other words, we can continue with the review after merge, and revert the PR if it proves to be problematic.


alamb commented Sep 12, 2022

I agree -- let's merge this PR and we can either revert it or address comments in a follow-on PR if/when provided by @liukun4515

Thanks all!

@alamb alamb merged commit 7427a80 into apache:master Sep 12, 2022
@andygrove andygrove deleted the fix-type-coercion-bugs branch September 12, 2022 17:08

ursabot commented Sep 12, 2022

Benchmark runs are scheduled for baseline = 3e56a0f and contender = 7427a80. 7427a80 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Labels
bug Something isn't working logical-expr Logical plan and expressions optimizer Optimizer rules
5 participants